This article explores the architecture enabling AI chatbots to perform web searches, covering retrieval-augmented generation (RAG), vector databases, and the challenges of integrating search with LLMs.
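As a rough sketch of the retrieve-then-ground loop such architectures share, here is a minimal Python example; `web_search` and `llm_complete` are hypothetical stand-ins for a real search API and model client, not any specific provider's interface:

```python
# Minimal search-grounded generation loop (illustrative only).

def web_search(query: str, k: int = 3) -> list[dict]:
    """Hypothetical search API; returns canned results in this sketch."""
    return [{"title": f"Result {i}", "snippet": f"A snippet about {query}."}
            for i in range(k)]

def llm_complete(prompt: str) -> str:
    """Hypothetical LLM client; echoes a placeholder in this sketch."""
    return f"(answer grounded in {prompt.count('[')} retrieved sources)"

def answer_with_search(question: str) -> str:
    results = web_search(question)                        # 1. retrieve
    context = "\n".join(f"[{i}] {r['title']}: {r['snippet']}"
                        for i, r in enumerate(results))   # 2. ground
    prompt = (f"Answer using only the sources below.\n{context}\n\n"
              f"Question: {question}")
    return llm_complete(prompt)                           # 3. generate

print(answer_with_search("How do chatbots search the web?"))
```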
This paper addresses the misalignment between traditional IR evaluation metrics and the requirements of modern Retrieval-Augmented Generation (RAG) systems. It proposes a novel annotation schema and the UDCG metric to better evaluate retrieval quality for LLM consumers.
This article details the process of building a fast vector search system for a large legal dataset (Australian High Court decisions). It covers choosing an embedding provider, benchmarking performance, using USearch with Isaacus embeddings, and the importance of reading API terms of service. It focuses on achieving speed and scalability while maintaining reasonable accuracy.
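For flavor, here is a minimal USearch sketch under assumed parameters (1024-dimensional embeddings; random vectors stand in for real Isaacus embeddings, whose client call is omitted):

```python
import numpy as np
from usearch.index import Index  # pip install usearch

DIM = 1024  # assumed embedding width; match your provider's model

# HNSW-backed approximate-nearest-neighbour index over cosine distance.
index = Index(ndim=DIM, metric="cos")

# Placeholder vectors; in the article's setting these would be
# embeddings of High Court judgment passages.
keys = np.arange(50_000)
vectors = np.random.rand(50_000, DIM).astype(np.float32)
index.add(keys, vectors)

# Embed the query the same way, then take the 10 nearest neighbours.
query = np.random.rand(DIM).astype(np.float32)
matches = index.search(query, 10)
print(matches.keys, matches.distances)
```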
In this paper, we introduce PLUM, a framework designed to adapt pre-trained LLMs to industry-scale recommendation tasks. PLUM consists of item tokenization using Semantic IDs, continued pre-training (CPT) on domain-specific data, and task-specific fine-tuning for recommendation objectives. We conduct comprehensive experiments on large-scale internal video recommendation datasets and demonstrate substantial improvements in retrieval over a heavily optimized production model.
A blog post comparing when to use regular Google search versus LLMs for research, outlining the strengths and weaknesses of each. It details scenarios where search engines excel (facts, current events, specific sources) and where LLMs shine (analysis, synthesis, creative thinking). It also lists tasks LLMs struggle with, such as complex reasoning, accessing real-time information, and fact verification.
This blog post details an experiment testing the ability of LLMs (Gemini, ChatGPT, Perplexity) to accurately retrieve and summarize recent blog posts from a specific URL (searchresearch1.blogspot.com). The author found significant issues with hallucinations and inaccuracies, even in models claiming live web access, highlighting the unreliability of LLMs for even simple research tasks.
This article introduces the pyramid search approach using Agentic Knowledge Distillation to address the limitations of traditional RAG strategies in document ingestion.
The pyramid structure allows for multi-level retrieval, including atomic insights, concepts, abstracts, and recollections. This structure mimics a knowledge graph but uses natural language, making it more efficient for LLMs to interact with.
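One possible in-memory shape for such a pyramid, sketched with illustrative field names (not the article's exact schema):

```python
from dataclasses import dataclass, field

@dataclass
class DocumentPyramid:
    """Four retrieval levels, base to tip, all stored as natural language."""
    insights: list[str] = field(default_factory=list)       # base: atomic facts per page window
    concepts: list[str] = field(default_factory=list)       # higher-level, deduplicated themes
    abstract: str = ""                                        # dense whole-document summary
    recollections: list[str] = field(default_factory=list)  # top: cross-task critical memory
```

Multi-level retrieval can then work coarse-to-fine, e.g. matching on abstracts first and drilling into concepts and insights for supporting detail.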
**Knowledge Distillation Process** (a minimal code sketch follows the list):
- **Conversion to Markdown**: Documents are converted to Markdown for better token efficiency and processing.
- **Atomic Insights Extraction**: Each page is processed using a two-page sliding window to generate a list of insights in simple sentences.
- **Concept Distillation**: Higher-level concepts are identified from the insights to reduce noise and preserve essential information.
- **Abstract Creation**: An LLM writes a comprehensive abstract for each document, capturing dense information efficiently.
- **Recollections/Memories**: Critical information useful across all tasks is stored at the top of the pyramid.
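A minimal sketch of that pipeline, with a stubbed `llm` helper standing in for a real model client (prompts and helper names are illustrative):

```python
def llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real client."""
    return "stub response"

def extract_insights(pages: list[str]) -> list[str]:
    """Two-page sliding window; each pass yields simple-sentence insights."""
    insights: list[str] = []
    for i in range(len(pages)):
        window = "\n".join(pages[i : i + 2])  # current page plus the next
        out = llm("List each factual insight in this text as one simple sentence:\n" + window)
        insights.extend(line.strip() for line in out.splitlines() if line.strip())
    return insights

def distill(pages: list[str]) -> dict:
    """Build the lower pyramid levels for one document."""
    insights = extract_insights(pages)
    concepts = llm("Distill these insights into higher-level concepts:\n" + "\n".join(insights))
    abstract = llm("Write a comprehensive, information-dense abstract:\n" + "\n".join(pages))
    return {"insights": insights, "concepts": concepts, "abstract": abstract}
```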
Re-ranking is integral to retrieval pipelines, but implementation methods vary. We introduce `rerankers`, a Python library offering a unified interface for common re-ranking approaches.
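A short usage sketch based on the library's documented unified interface (the model shorthand and example texts are illustrative):

```python
from rerankers import Reranker  # pip install rerankers

# Load a default cross-encoder; swapping in an API-based or LLM reranker
# only changes this constructor call, not the code below.
ranker = Reranker("cross-encoder")

results = ranker.rank(
    query="What are the health benefits of green tea?",
    docs=[
        "Green tea is rich in antioxidants such as catechins.",
        "The 1905 season was a turning point for the club.",
        "Regular green tea consumption is linked to improved metabolism.",
    ],
)
print(results.top_k(2))  # the two most relevant documents, highest score first
```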
This article explores the limitations of position-based chunking in Retrieval-Augmented Generation (RAG) systems and proposes semantic chunking as an alternative that improves retrieval performance.
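One common realization of semantic chunking (not necessarily the article's exact method) embeds sentences and starts a new chunk wherever similarity between neighbours drops; a sketch using sentence-transformers, with the model name and threshold as assumptions:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def semantic_chunks(sentences: list[str], threshold: float = 0.6) -> list[str]:
    """Group sentences into chunks, splitting at semantic boundaries."""
    if not sentences:
        return []
    embs = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for prev, cur, sent in zip(embs, embs[1:], sentences[1:]):
        if float(np.dot(prev, cur)) < threshold:  # similarity drop => topic boundary
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```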
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems, which combine information retrieval with generative models to provide accurate and contextually rich responses.
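As a baseline for the techniques the repository builds on, here is a minimal end-to-end RAG sketch (the embedding model and `generate` stub are illustrative assumptions, not the repository's code):

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "The Eiffel Tower is located in Paris, France.",
    "Python was created by Guido van Rossum in 1991.",
]
doc_embs = model.encode(docs, normalize_embeddings=True)

def generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with any model client."""
    return f"(answer conditioned on: {prompt!r})"

def rag_answer(question: str, k: int = 1) -> str:
    q_emb = model.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(doc_embs @ q_emb)[::-1][:k]  # cosine similarity via dot product
    context = "\n".join(docs[i] for i in top)
    return generate(f"Context:\n{context}\n\nQuestion: {question}")

print(rag_answer("Who created Python?"))
```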